An active bilingual lexicon for machine translation

Authors

  • Igal Golan
  • Shalom Lappin
  • Mori Rimon
Abstract

An approach to the Transfer phase of a Machine Translation system is presented, where the bilingual lexicon plays an active role, guiding Transfer by means of executable descriptions of word senses. The means for lexical sense specification are, however, general enough and can in principle apply to other system architectures, e.g. in the Generation phase if Transfer is intentionally kept minimal. The active lexicon is the one and only system component which is exposed to users and can serve to linguistically control Transfer effects. A unified approach to lexicon creation and maintenance is proposed, which contains means to gradually refine sense specification and tailor the definitions to specific text domains. The underlying linguistic principles, the nature of sense distinction required for translation, and the formal structure of the lexicon are discussed.

1. Introduction

While methods of monolingual Analysis and Generation are also treated in other contexts, bilingual Transfer problems are hardly investigated outside the context of Machine Translation. Research in Machine Translation can, in this case, make a specific contribution to Computational Linguistics. The general issue here is the formal representation of phrase structures and lexical units and the methodology for specifying transformations between these representations in two (or more) languages. The role of the bilingual lexicon in the Transfer activity, and its power to assist in the resolution of mapping problems, is a key element. In this paper we present an approach to the formal representation of bilingual lexical knowledge and to the way this knowledge is incorporated into the translation process. In section 2 we describe the role and place of the bilingual lexicon in the translation process, present the concept of executable descriptions of word senses as lexical definitions, and discuss some aspects of practical usage.
Our approach to the sense distinction required for translation, which is different from monolingual sense distinction, is discussed in section 3. In section 4 we make a few methodological comments, arguing that what is often portrayed as the ideal Transfer-based architecture is not the only, and not necessarily the best, way to achieve modularity and save work. Section 5 contains a formal definition of the lexicon specification language with some discussion of its features and the intended restrictions on the power given to the lexicographer. Finally, an additional example is given in detail in section 6.

This work has been carried out as part of the MENTOR project, where several groups in European IBM Scientific Centers are collaborating on M(A)T research. The approach presented here has been developed and prototyped by the group in Haifa, Israel, as part of the proposal for the design of Transfer-related operations. The examples below involve translations from English into Hebrew. We thank our colleagues Danit Ben-Ari, Esther Bentur and Maria Vilkuna for their contributions and comments.

2. The Role and Content of the Bilingual Lexicon

[Cullingford 87] describes an MT system which is purely lexicon driven. His system follows the Conceptual Processing model, and is not Transfer-based, hence the emphasis there is on deep Analysis and Generation. Many other systems distinguish between Lexical Transfer and Structural Transfer, but they take different approaches to the actual separation of these two sub-processes. In the work reported here, an attempt was made to strictly separate lexicon-driven selection of target language equivalents from the global mapping of syntactic structures in the SL into those of the TL (cf. [Biewer 85]). The lexicon lookup phase, which takes place before phrase structure transformations, gets as its input the internal data representation provided by the SL parser (PEG [Jensen 86], in our prototype).
The terminal nodes (leaves of a parse tree) are searched in a pre-defined order for certain parts of speech. For each word in turn a target equivalent is selected from the bilingual lexicon and attached to the corresponding node in the parse. Features may also be added to other affected nodes. However, no structural modifications are made. Structural transformations are carried out as an independent sub-process, upon completion of the bilingual lexical phase, and are not discussed in this paper.

Since in many cases, and in fact for most verbs, several alternative translations exist, the selection is done by lexical differentiation rules. These rules refer to the syntactic environment of the word in the parse tree and to a limited number of semantic features. The rules can access any node and attribute identified by the parser. Given that the rules are stated in terms of the SL phrase structure, it seems more natural to apply them as close to Analysis as possible. Nevertheless, the sense distinction cannot be done as part of SL Analysis itself, as in many cases it depends on factors which may vary from one TL to another.

The sub-process of bilingual lexical substitution proceeds unidirectionally. No iterations take place for any given phrase. In some cases this may require extensive searching of the phrase ...

The following abbreviations are used throughout this paper: SL = Source Language, BL = BiLingual, TL = Target Language.
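The lookup phase described in section 2 can be illustrated with a minimal Python sketch. It is not the authors' lexicon specification language (which the paper defines in its section 5) and not the PEG parser's data model. The `Node` and `Rule` classes, the `POS_ORDER` list, the `MACHINE` and `CAUSATIVE` features, and the placeholder target strings `run:operate` and `run:move` are all hypothetical names chosen for illustration. The sketch only shows the general shape: leaves are visited in a fixed part-of-speech order, the first differentiation rule whose test holds attaches a target equivalent to the leaf, a feature may be added to another node, and the tree structure is never changed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Set, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical parse-tree node; leaves carry a source-language word."""
    pos: str
    word: Optional[str] = None
    features: Set[str] = field(default_factory=set)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)
    target: Optional[str] = None  # TL equivalent attached by the lookup

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def leaves(node: Node) -> Iterator[Node]:
    """Yield the terminal nodes below `node`, left to right."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


@dataclass
class Rule:
    """One differentiation rule: an environment test plus a TL equivalent."""
    test: Callable[[Node], bool]
    target: str
    mark: Optional[str] = None  # optional feature to add to the parent node


Lexicon = Dict[Tuple[str, str], List[Rule]]
POS_ORDER = ["VERB", "NOUN", "ADJ"]  # assumed order, not taken from the paper


def lexical_lookup(root: Node, lexicon: Lexicon) -> Node:
    """Single pass per part of speech; attaches targets, never restructures."""
    terminals = list(leaves(root))
    for pos in POS_ORDER:
        for leaf in terminals:
            if leaf.pos != pos or leaf.word is None:
                continue
            for rule in lexicon.get((leaf.word, pos), []):
                if rule.test(leaf):
                    leaf.target = rule.target
                    if rule.mark and leaf.parent is not None:
                        leaf.parent.features.add(rule.mark)
                    break  # first matching rule wins; no iteration
    return root


def sibling_has(feature: str) -> Callable[[Node], bool]:
    """Test: some leaf under a sibling of this word carries `feature`."""
    def test(leaf: Node) -> bool:
        if leaf.parent is None:
            return False
        return any(feature in n.features
                   for sib in leaf.parent.children if sib is not leaf
                   for n in leaves(sib))
    return test


# Hypothetical entry: two senses of the English verb "run".
LEXICON: Lexicon = {
    ("run", "VERB"): [
        Rule(sibling_has("MACHINE"), "run:operate", mark="CAUSATIVE"),
        Rule(lambda leaf: True, "run:move"),  # default sense
    ],
}


def build(obj_word: str, obj_features: Set[str]) -> Tuple[Node, Node]:
    """Build a tiny VP -> VERB NP(NOUN) tree for the verb 'run'."""
    vp = Node("VP")
    verb = vp.add(Node("VERB", "run"))
    np = vp.add(Node("NP"))
    np.add(Node("NOUN", obj_word, set(obj_features)))
    return vp, verb


if __name__ == "__main__":
    vp, verb = build("engine", {"MACHINE"})
    lexical_lookup(vp, LEXICON)
    print(verb.target, sorted(vp.features))
```

Ordering the rules from most specific to a catch-all default is one simple way to keep the selection deterministic in a single pass; whether the original system resolves competing rules this way is not stated in the text above.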

Similar resources

Improving the Performance of an Example-Based Machine Translation System Using a Domain-specific Bilingual Lexicon

In this paper, we study the impact of using a domain-specific bilingual lexicon on the performance of an Example-Based Machine Translation system. We conducted experiments for the English-French language pair on in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from Emea (European Medicines Agency Documents), and we compared the results of the Example-Based ...

Corpus-Driven Bilingual Lexicon Extraction

This paper introduces some key aspects of machine translation in order to situate the role of the bilingual lexicon in transfer-based systems. It then discusses the data-driven approach to extracting bilingual knowledge automatically from bilingual texts, tracing the processes of alignment at different levels of granularity. The paper concludes with some suggestions for future work. 1 Machine T...

Automatically Creating Bilingual Lexicons for Machine Translation from Bilingual Text

A method is presented for automatically augmenting the bilingual lexicon of an existing Machine Translation system, by extracting bilingual entries from aligned bilingual text. The proposed method only relies on the resources already available in the MT system itself. It is based on the use of bilingual lexical templates to match the terminal symbols in the parses of the aligned sentences.

Machine Translation without a Bilingual Dictionary

This paper outlines experiments conducted to determine the contribution of the traditional bilingual dictionary in the automatic alignment process to learn translation patterns, and at runtime. We found that by using automatically derived translation word pairs combined with a function word only lexicon, we were able to either match or nearly match the translation quality of the system that use...

Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language

This paper proposes a novel method for building a bilingual lexicon through a pivot language by using phrase-based statistical machine translation (SMT). Given two bilingual lexicons between language pairs Lf–Lp and Lp–Le, we assume these lexicons as parallel corpora. Then, we merge the extracted two phrase tables into one phrase table between Lf and Le. Finally, we construct a phrase-based SMT...

Mining a Bilingual Lexicon of MultiWord Expressions : A Statistical Machine Translation Evaluation Perspective (Acquisition de lexique bilingue d'expressions polylexicales: Une application à la traduction automatique statistique) [in French]

This paper describes a method aiming to construct a bilingual lexicon of MultiWord Expressions (MWEs) from a French-English parallel corpus. We first extract monolingual MWEs from each part of the parallel corpus. The second step consists in acquiring bilingual correspondences of MWEs....

Publication date: 1988